Abstract. The sequence of the replicase gene of porcine epidemic diarrhoea virus (PEDV) has been determined.
This completes the sequence of the entire genome of strain CV777, which was found to be 28,033 nucleotides (nt)
in length (excluding the poly(A) tail). A cloning strategy using primers based on conserved regions in the
predicted ORF1 products of other coronaviruses whose genome sequences have been determined was used to
amplify the corresponding, previously unknown, sequence of PEDV. Sequences derived from these products
were used to design additional primers, resulting in the amplification and sequencing of the entire ORF1 of PEDV.
Analysis of the nucleotide sequence revealed a small open reading frame (ORF) near the 5' end (nt 99-137)
and two large, slightly overlapping ORFs, ORF1a (nt 297-12650) and ORF1b (nt 12605-20641). The
ORF1a and ORF1b sequences overlap at a potential ribosomal frame shift site. Amino acid sequence
analysis suggested the presence of several functional motifs within the putative ORF1 protein. By analogy to other
coronavirus replicase gene products, three protease motifs and one growth factor-like motif were identified in ORF1a,
and one polymerase domain, one metal ion-binding domain and one helicase motif could be assigned within ORF1b.
Comparative amino acid sequence alignments revealed that PEDV is most closely related to human coronavirus
(HCoV)-229E and transmissible gastroenteritis virus (TGEV), and less closely related to murine hepatitis virus (MHV)
and infectious bronchitis virus (IBV). These results confirm and extend the findings from sequence analysis of the
structural genes of PEDV.
Introduction

Porcine epidemic diarrhoea virus (PEDV) is a
causative agent of diarrhoea in pigs, particularly in
neonates. The disease has been recognised for
approximately thirty years, but the causative virus
was first described only in 1978 [1], and another ten
years elapsed before a method was developed for
propagating the virus in cell culture [2]. During
this time, outbreaks of the disease were reported from
numerous European countries as well as Korea,
China and Japan. The epidemiology and pathogenesis
of the disease have been well described by Pensaert
[3]. The biological behaviour, electron microscopic
appearance and polypeptide structure of PEDV
resulted in its provisional classification as a
coronavirus [2,4,5].

Coronaviruses belong to the order Nidovirales and
contain a single-stranded RNA genome of positive
polarity, approximately thirty kilobases in length. The
genes encoding the structural proteins are located at
the 3' end of the genome. Roughly two-thirds of the
genome is taken up by the replicase gene, which is
located at the 5' end. The replicase proteins are
encoded by ORF1a and ORF1b. In all coronaviruses
sequenced to date, these two long, slightly overlapping
ORFs are connected by a ribosomal frame shift site.
This site regulates the ratio between the polypeptide
encoded by ORF1a and the read-through product
encoded by ORF1ab: about 70-80% of translation
events terminate at the end of ORF1a, while 20-30%
continue to the end of ORF1b. The polypeptides are
post-translationally processed by virus-encoded
proteases [reviewed by 6]. These proteases are
encoded within ORF1a, whereas the polymerase and
helicase functions are encoded by ORF1b.

We have previously completed the sequencing of
the nucleocapsid (N), membrane (M), small membrane
(E), ORF3 and spike (S) genes of PEDV strain
CV777 [7-9]. Alignment of the deduced amino acid
sequences indicated that PEDV occupies an
interesting intermediate position between the two
well-characterized members of the group I
coronaviruses, transmissible gastroenteritis virus
(TGEV) and human coronavirus (HCoV)-229E. In this
study, we have continued to determine and analyse
nucleotide sequences of PEDV. To our knowledge,
only two group I coronaviruses, HCoV-229E and
TGEV, have been sequenced completely [10,11]. In
addition, two strains of mouse hepatitis virus (MHV),
JHM and A59, which belong to the group II
coronaviruses, and infectious bronchitis virus (IBV)
have been completely sequenced [12-15]. The
sequence presented in this paper is therefore the
sixth complete coronavirus genome sequence.


Materials and Methods 

Growth of Virus and Preparation of Viral RNA 

Growth of cell-adapted PEDV strain CV777 was
performed essentially as described elsewhere [2,8],
except that virus-infected cells were harvested at
approximately 18 h post-infection. Cells were
freeze-thawed three times and cell debris was
removed by low-speed centrifugation. Virus was
pelleted by centrifugation for 2 h at 22,000 rpm and
4°C in an SW28 rotor of a Beckman centrifuge. Virus
pellets prepared from two 175 cm² flasks were pooled
and resuspended in 1 ml Trizol™ (Gibco-BRL), and
RNA was prepared as recommended by the
manufacturer.

cDNA Synthesis and PCR Amplification of Viral Sequences

In order to obtain the first partial PEDV-specific
sequences, the predicted amino acid sequences of the
HCoV-229E and TGEV polymerase ORFs were
aligned and homologous regions identified. These
regions were used to design degenerate primers [9]
for RT-PCR amplification. The initial amplicons were
cloned and sequenced [9]. Subsequently, a mixture of
up to six antigenome-sense primers, based either on
PEDV-specific sequences or on the degenerate
primers, together with a random hexamer primer
(purchased from Schmidheini AG; Balgach,
Switzerland), was used for first-strand cDNA
synthesis. RNA prepared from two 175 cm² flasks
of virus-infected cells was denatured for 10 min at
65°C, and first-strand cDNA synthesis was performed
in a 20 µl total reaction volume using SuperScript II™
(GibcoBRL; Basel, Switzerland) according to the
manufacturer's protocol. To obtain longer reverse
transcription products, the protocol was modified by
including a denaturation step at 95°C for 5 min after
the first 1 h incubation at 42°C, followed by the
addition of 1 µl SuperScript II™ and a second 1 h
incubation at 42°C. Template RNA was
digested by adding 1 µl RNase H (GibcoBRL; Basel,
Switzerland) to the reaction mix and incubating at
37°C for 20 min.

PCR amplification was performed as described
elsewhere. In brief, Pfu DNA polymerase
(Stratagene; Basel, Switzerland) was used for the
amplifications, which were performed on a DNA
Engine (MJ Research) thermal cycler. PCR fragments were
subsequently cloned into pBluescript II KS+ or
pUC19 vectors using standard procedures, and the
nucleotide sequence was determined from these
cDNA clones. Direct sequencing was performed on
one RT-PCR product (see Fig. 1B), which was purified
on an agarose gel. Sequence contigs were assembled
using SeqMan (DNASTAR Lasergene; Madison, WI,
USA). We previously reported the determination of
the PEDV leader sequence on the mRNA encoding
the N gene [16]. This sequence was used to design
primers to amplify the 5' end of the genome. The
leader sequence was also used for the in silico
construction of the genomic RNA sequence, which is
available in the GenBank database (accession number
AF353511).

Fig. 1. Schematic presentation of the PEDV genome and map of cDNA clones. (A) The open reading frames (ORFs) are represented as boxes.
The following domains located in ORF1ab are shown: papain-like proteinase (PLP), X domain (X), poliovirus 3C-like proteinase (3CL), growth
factor-like domain (Gfl), RNA-dependent RNA polymerase (RdRp), metal ion-binding domain (MB) and helicase (Hel). An amino acid scale is
shown at the bottom. (B) The cDNA clones and the RT-PCR product used to determine the nucleotide sequence presented in this
paper, shown at their locations on the genomic RNA of PEDV. The upper part of the figure shows the initial RT-PCR products amplified with
degenerate primers designed from conserved coronavirus sequences. The lower part shows RT-PCR products amplified with primers based on
PEDV sequences of the initial cDNA clones. Clones are depicted as lines, and the directly sequenced RT-PCR product as a two-headed
arrow. A nucleotide scale is shown at the bottom.

Sequence Analysis 

Virus sequences covering the replicase genes were
obtained from the GenEMBL sequence database.
The entries with accession numbers X69721,
Z34093, AF029248 and M95169 for HCoV-229E,
TGEV (Purdue 115), MHV-A59 and IBV (Beaudette),
respectively, were used.

The deduced amino acid sequences were compared
as indicated in the text using PILEUP and GAP
(GCG Package version 10.0; Madison, WI, USA).
The alignments generated by PILEUP were used in
DISTANCES (GCG Package version 10.0; Madison,
WI, USA) to determine Kimura protein sequence
distances, which were subsequently used to
construct unrooted dendrograms with TreeGen
on the CBRG server (http://cbrg.inf.ethz.ch/).
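Kimura's protein distance corrects the observed proportion p of differing residues in an alignment for multiple substitutions at the same site, using the empirical approximation d = -ln(1 - p - 0.2p²). A minimal Python sketch for a pair of rows taken from an existing alignment (such as PILEUP output) is given below. It is an illustration, not the GCG DISTANCES implementation: ignoring gapped columns is an assumption, and DISTANCES may handle gaps and ambiguous residues differently. The function names and example fragments are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues over gap-free columns of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same alignment")
    # Assumption: columns with a gap ('-' or '.') in either sequence are ignored.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in "-." and b not in "-."
    ]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_protein_distance(seq_a: str, seq_b: str) -> float:
    """Kimura's empirical protein distance: d = -ln(1 - p - 0.2 * p**2)."""
    p = p_distance(seq_a, seq_b)
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0.0:
        # The approximation is undefined for very divergent pairs (p above ~0.85).
        raise ValueError("sequences too divergent for the Kimura approximation")
    return -math.log(argument)


# Hypothetical aligned fragments, for illustration only (not PEDV data).
print(round(kimura_protein_distance("MILSDDGWCYN-AG", "MILSDDSWCYNKTG"), 4))
```

In the example, one gapped column is skipped and 2 of the remaining 13 columns differ, so p = 2/13 and d is about 0.17. A full distance matrix is simply this function applied to every pair of rows, and that matrix is what a tree-building program such as TreeGen takes as input.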

Results and Discussion 

Cloning Strategy 

The approach we used previously to clone the
PEDV M and N genes involved designing primers
based on conserved regions of the coronavirus M and
N genes to amplify the corresponding, previously
unknown, PEDV sequences. In this study, we employed
this technique to clone parts of the ORF1 of PEDV. Such
a method is useful for viruses that do not grow to
high titre, avoids lengthy screening of clones and
could potentially be applied to the cloning of any
group I coronavirus. However, the large size of ORF1
and the paucity of sequence data from other
coronaviruses made this an ambitious objective. A
number of conserved functional domains have been
identified in the predicted ORF1 products, but these
domains are mainly located in the ORF1b region,
leaving large regions of the ORF1a product with no
known function and only a low level of sequence
conservation between different coronavirus genomes.
In order to clone and determine the sequence of the
PEDV ORF1, the predicted amino acid sequences of
the HCoV-229E and TGEV ORF1 were aligned and
homologous regions identified. The HCoV-229E and
TGEV ORFs were sufficiently closely related to
allow complete alignment of the predicted expression
products. In contrast, the MHV and IBV sequences
were much more divergent and could be aligned
with the group I sequences only in some of the
conserved regions. Degenerate primers were designed
from regions conserved between the HCoV-229E and
TGEV ORF1 and, where possible, the MHV and IBV ORF1.
These primers were used both to prime reverse
transcription and for PCR amplification.
Sequence data derived from these PCR products
allowed us to design sequence-specific primers, which
were then used to amplify the entire ORF1 (see
Fig. 1B).
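The degenerate-primer step described above amounts to back-translating a conserved peptide while allowing every synonymous codon. A minimal sketch, assuming the standard genetic code and IUPAC nucleotide ambiguity codes, is shown below; the function names and the example peptide are hypothetical and do not correspond to the primers actually used in this study [9]. Practical primer design also weighs overall degeneracy, melting temperature and 3'-end composition, which the sketch ignores.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TTT, TTC, TTA, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def degenerate_codon(aa: str) -> str:
    """Collapse all codons for one amino acid, position by position, into IUPAC codes."""
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa.upper()]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return "".join(IUPAC[frozenset(codon[i] for codon in codons)] for i in range(3))


def back_translate(peptide: str) -> str:
    """Degenerate, genome-sense (5'->3') primer sequence for a conserved peptide."""
    return "".join(degenerate_codon(aa) for aa in peptide)


def reverse_complement(primer: str) -> str:
    """Antigenome-sense primer, e.g. for priming reverse transcription of genomic RNA."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    size = {code: len(bases) for bases, code in IUPAC.items()}
    total = 1
    for code in primer.upper():
        total *= size[code]
    return total


# Hypothetical conserved peptide, for illustration only.
sense = back_translate("SDD")
print(sense, reverse_complement(sense), degeneracy(sense))
```

For the tripeptide SDD the sketch gives the genome-sense primer WSNGAYGAY (64-fold degenerate) and its antigenome-sense reverse complement RTCRTCNSW. Collapsing codons position by position over-represents residues encoded by two codon families (Ser, Leu, Arg): WSN, for instance, also matches Thr codons (ACN).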

Analysis of the Nucleotide Sequence, Prediction of ORFs

Numerous small cDNA clones, five large cDNA
clones and one RT-PCR product covering the 5'
two-thirds of the PEDV genome were used to determine
the nucleotide sequence of the PEDV ORF1 (Fig. 1).
This analysis completes the nucleotide sequence of
PEDV, making it the sixth coronavirus genome to be
sequenced in its entirety [10-13,15]. The genome of
PEDV (CV777), excluding the poly(A) tail, is
28,033 nt in length.

Analysis of the newly determined nucleotide
sequence revealed a pattern of ORFs typical of
coronaviruses. A small ORF with the potential to
encode a 12-amino acid peptide was found at the 5'
end of the genome, at nucleotide positions
99-137. Such small upstream ORFs (uORFs) are present in
all coronaviruses sequenced so far. The uORFs of
HCoV-229E [17] and IBV [15] are eleven codons in
length, while that of MHV is eight codons long
[18,19], and that of TGEV can encode only
a three-amino acid peptide [20]. Two long ORFs of
12,354 and 8,037 nt, which overlap by 46 nt, cover
most of the newly determined sequence. By analogy
to published coronavirus sequences [15,17,20], these
ORFs were designated ORF1a and ORF1b. The
predicted ORF1a of PEDV extends from nucleotide
297 to 12650 and encodes 4117 amino acids.
The overlapping ORF1b, starting at nucleotide 12605
and ending at nucleotide 20641, has the capacity to
encode 2678 amino acids.
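The ORF boundaries given above can be cross-checked with simple coordinate arithmetic on 1-based, inclusive positions. The sketch below uses only coordinates quoted in the text and assumes that each stated end coordinate includes the stop codon, which is the reading under which the 12- , 4117- and 2678-amino acid figures come out; the class and function names are illustrative.

```python
from dataclasses import dataclass

GENOME_LENGTH_NT = 28_033  # PEDV CV777, excluding the poly(A) tail


@dataclass(frozen=True)
class Orf:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive; assumed to include the stop codon

    @property
    def length_nt(self) -> int:
        return self.end - self.start + 1

    @property
    def protein_length(self) -> int:
        """Encoded amino acids, i.e. codons minus the stop codon."""
        if self.length_nt % 3:
            raise ValueError(f"{self.name}: length is not a multiple of 3")
        return self.length_nt // 3 - 1


def overlap_nt(a: Orf, b: Orf) -> int:
    """Number of nucleotides shared by two ORFs (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# Coordinates as reported in the text.
UORF = Orf("uORF", 99, 137)
ORF1A = Orf("ORF1a", 297, 12650)
ORF1B = Orf("ORF1b", 12605, 20641)

for orf in (UORF, ORF1A, ORF1B):
    print(f"{orf.name}: {orf.length_nt} nt, {orf.protein_length} aa")
print(f"ORF1a/ORF1b overlap: {overlap_nt(ORF1A, ORF1B)} nt")
fraction = (ORF1B.end - ORF1A.start + 1) / GENOME_LENGTH_NT
print(f"ORF1a start to ORF1b end: {fraction:.1%} of the genome")
```

With these coordinates the script reports 39 nt (12 aa) for the uORF, 12,354 nt (4117 aa) for ORF1a, 8,037 nt (2678 aa) for ORF1b and a 46-nt overlap, matching the figures in the text; the region from the ORF1a start to the ORF1b end covers about 72.6% of the genome.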

It has been proposed for coronaviruses and other
members of the order Nidovirales [21] that the
nucleotide sequences in the overlapping region of
ORF1a and ORF1b can fold into a pseudoknot
tertiary structure [22,23]. This structure causes the
ribosome to shift its reading frame during
translation of ORF1a, so that translation continues
in ORF1b. The function of these RNA structures as
ribosomal frame shift sites was demonstrated for
the analogous sequences of IBV [24] and
HCoV-229E [25]. It seems likely that translation
of PEDV ORF1b is likewise mediated by ribosomal
frame shifting. The nucleotide sequences of PEDV,
HCoV-229E and TGEV covering the ribosomal
frame shift site are more similar to one another
than to those of MHV-A59 or IBV. In order to identify
the sequences that could be involved in forming
the tertiary structure, the nucleotide sequences covering
the end of ORF1a and the beginning of ORF1b
from HCoV-229E [25] and TGEV [20] were aligned
with the corresponding sequence of PEDV. Fig. 2A
shows the predicted frame shift region of PEDV
based on this comparison. The so-called slippery site
(UUUAAAC), at which frame shifting occurs, is
identical in all coronaviruses sequenced so far.
The stems and loops that form the tertiary
structure of the frame shift regions of TGEV and
HCoV-229E were compared, and Fig. 2B shows the
predicted tertiary structure required for the frame
shift of PEDV based on this comparison.
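Coronavirus ORF1b is reached by a -1 ribosomal frameshift, so ORF1b should lie in the reading frame one nucleotide upstream of ORF1a's frame, and the slippery heptamer must sit in the zero frame as U UUA AAC. The sketch below checks the first point from the coordinates given in the text and illustrates the second on a short, entirely hypothetical RNA. In the accepted model the tRNAs on UUA and AAC slip back one nucleotide to UUU and AAA; the sketch models only the net effect on which codons are read next, and the pseudoknot that stimulates slippage is not modelled at all. All names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), RNA alphabet.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SLIPPERY_SITE = "UUUAAAC"


def frame(position_1based: int) -> int:
    """Reading frame (0, 1 or 2) of an ORF starting at a 1-based genome position."""
    return (position_1based - 1) % 3


def translate(rna: str, start: int) -> str:
    """Translate from 0-based index `start` until a stop codon or the sequence end."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def translate_with_minus1_shift(rna: str, start: int) -> str:
    """Simplified -1 frameshift model at a U UUA AAC slippery heptamer.

    Translation proceeds in the zero frame up to the codon ending with the last
    nucleotide of the heptamer; the next codon then starts one nucleotide
    earlier, so decoding continues in the -1 frame.
    """
    site = rna.find(SLIPPERY_SITE, start)
    if site < 0:
        raise ValueError("no slippery site found")
    if (site - start) % 3 != 2:
        raise ValueError("heptamer is not in the X XXY YYZ register of the zero frame")
    site_end = site + len(SLIPPERY_SITE)  # 0-based index just past the heptamer
    zero_frame = translate(rna[:site_end], start)
    if len(zero_frame) < (site_end - start) // 3:
        raise ValueError("zero-frame stop codon before the slippery site")
    return zero_frame + translate(rna, site_end - 1)


# ORF1b should lie in the -1 frame relative to ORF1a (coordinates from the text).
print(frame(297), frame(12605), (frame(297) - 1) % 3 == frame(12605))

# Hypothetical toy RNA: AUG GCU UUA AAC UAA ... with the heptamer at index 5.
toy = "AUGGCUUUAAACUAAGCUAG"
print(translate(toy, 0))                    # zero frame only
print(translate_with_minus1_shift(toy, 0))  # with the simulated -1 shift
```

ORF1a (nt 297) starts in frame 2 and ORF1b (nt 12605) in frame 1, consistent with a -1 shift. In the toy RNA the zero frame stops at UAA after MALN; with the simulated slip the C of AAC is re-read and translation continues as CUA AGC (Leu, Ser) up to the UAG stop, giving MALNLS.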

Amino Acid Sequence Comparison 

Pairwise comparison of the deduced amino acid
sequences (using GAP) revealed that PEDV ORF1b is
more conserved than ORF1a relative to the corresponding
sequences of other coronaviruses. Percent similarities
and identities are shown in Table 1. The putative
protein sequence of ORF1a was most similar to that
of HCoV-229E ORF1a (59.4%) and less similar to
the corresponding ORF1a of TGEV (52.1%),
MHV-A59 (39.5%) and IBV (38.7%). The same
relationship, at a higher level of similarity, was found
for the deduced amino acid sequence of the predicted
PEDV ORF1b, which was most similar to the amino
acid sequences of HCoV-229E ORF1b and TGEV
ORF1b (83.2% and 80.3%, respectively). The
similarity to ORF1b from MHV-A59 and IBV was
around 64%. The deduced amino acid sequences of
ORF1a and ORF1b from PEDV were aligned with the
corresponding sequences of HCoV-229E, TGEV,
MHV-A59 and IBV using PILEUP. The degrees of
amino acid sequence relatedness are presented
graphically as dendrograms (Fig. 3A,B).

Fig. 2. The putative ribosomal frame shift region. (A) Nucleotide sequences covering the pseudoknot structures of HCoV-229E and TGEV
aligned to the corresponding sequence from PEDV using DNASTAR. The putative slippery sites are underlined, the stems are boxed and the loops
are indicated by two-headed arrows. (B) The putative tertiary structure of PEDV predicted from the sequence alignment in the upper
part of the figure.
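Given a pairwise alignment, percent identity and percent similarity of the kind reported in Table 1 can be tallied as below. This is a minimal sketch, not GCG's implementation: it assumes gapped columns are excluded from the denominator, and it replaces GAP's BLOSUM62-based similarity criterion with a hypothetical grouping of chemically similar residues, so its similarity values would not reproduce Table 1. Producing the alignment itself (GAP uses a Needleman-Wunsch-type global alignment with the gap penalties given in the Table 1 legend) is outside the scope of the sketch.

```python
# Hypothetical residue groups standing in for GAP's BLOSUM62-based similarity.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def percent_identity_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Percent identity and similarity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1
    if compared == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Hypothetical aligned fragments, for illustration only.
print(percent_identity_similarity("KLDN", "RIEN"))  # (25.0, 100.0)
print(percent_identity_similarity("AC-D", "AGWE"))
```

Identical residues always count as similar, so similarity is never lower than identity, which is the pattern seen throughout Table 1.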

The multiple sequence alignments revealed several
putative functional domains common to coronavirus
sequences [23,26] within the deduced amino acid
sequence of PEDV ORF1ab. Some of these had been
used to design the primers for the RT-PCR
amplification. The following motifs were observed in
the ORF1a region. Two motifs indicative of
papain-like proteases (PLP) were present at amino
acid positions 1077-1266 and 1716-1917. The PLP
motif is found twice in the replicase genes of
HCoV-229E, TGEV and MHV, but only once in that of
IBV; in this respect, PEDV resembles HCoV-229E,
TGEV and MHV rather than IBV. A highly conserved
region (X domain) was found between the two PLP
motifs. Although this motif is present in all
coronavirus sequences, its function is not yet
known. A picornavirus 3C-like (3CL) protease
domain is located between amino acids 2998 and
3299 of the PEDV ORF1a product. All corona- and
arteriviruses encode this motif, which represents the
main protease responsible for processing the
coronavirus polyproteins. Three markedly hydrophobic
domains conserved among coronaviruses are found in
ORF1a: the first is located after the second PLP motif,
and the other two flank the 3CL motif. Finally, a
growth factor-like (Gfl) domain was located close to
the end of ORF1a (amino acid positions 3965-4000).
In the ORF1b region, three conserved motifs could be
recognised, all of which play a role in viral
replication. A sub-sequence at amino acid positions
4636-4939 containing the characteristic tripeptide
SDD (GDD in most RNA viruses) [26] is probably
the active site of the RNA-dependent RNA
polymerase. A metal ion-binding domain covering
amino acids 5027-5103 and a helicase motif at amino
acid positions 5309-5624 were also observed in the
PEDV ORF1b product. Alignments of the deduced
amino acid sequences of the 3CL protease and the
polymerase motif from five different coronaviruses
are shown in Fig. 4A and 4B, respectively. The
findings concerning conserved domains are
summarised in Fig. 1A.

Table 1. Percent similarity and identity between ORF1a and
ORF1b of coronavirus sequences. The sequences were aligned with GAP
(GCG Package) using a BLOSUM62 weight matrix and default
settings (Gap Weight 8, Length Weight 2).

Similarity (%)    HCoV-229E      TGEV   MHV-A59       IBV
ORF1a
  PEDV                 59.4      52.1      39.4      38.7
  HCoV-229E               -      53.1      37.9      38.0
  TGEV                    -         -      37.6      39.6
  MHV-A59                 -         -         -      38.8
ORF1b
  PEDV                 83.2      80.3      64.5      64.3
  HCoV-229E               -      79.9      63.4      63.6
  TGEV                    -         -      64.9      64.6
  MHV-A59                 -         -         -      65.7

Identity (%)      HCoV-229E      TGEV   MHV-A59       IBV
ORF1a
  PEDV                 50.1      42.9      30.9      29.8
  HCoV-229E               -      43.2      28.9      28.9
  TGEV                    -         -      28.1      30.2
  MHV-A59                 -         -         -      30.0
ORF1b
  PEDV                 77.8      73.3      55.8      55.4
  HCoV-229E               -      72.7      54.6      54.5
  TGEV                    -         -      55.5      54.9
  MHV-A59                 -         -         -      57.1
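The domain coordinates reported above can be kept in a small annotation table, which makes it straightforward to check that they are ordered and non-overlapping and to assign each one to the ORF1a- or ORF1b-encoded part of the polyprotein, as in Fig. 1A. This is a sketch under stated assumptions: positions are taken to be numbered from the first residue of the ORF1ab product, and the 4117-residue ORF1a length is used as the boundary, ignoring the few residues around the frameshift site (none of the listed domains lies near it). The regular-expression scan for the SDD/GDD tripeptide only locates the sequence pattern and says nothing about whether a match is a functional active site; the peptide in the example is hypothetical.

```python
import re
from typing import NamedTuple

ORF1A_PROTEIN_LENGTH = 4117  # amino acids encoded by ORF1a (nt 297-12650)


class Domain(NamedTuple):
    name: str
    start: int  # amino acid position in the ORF1ab product, 1-based, inclusive
    end: int


# Positions as reported in the text; the X domain and the hydrophobic domains
# are omitted because no coordinates are given for them.
DOMAINS = [
    Domain("PLP1", 1077, 1266),
    Domain("PLP2", 1716, 1917),
    Domain("3CL", 2998, 3299),
    Domain("Gfl", 3965, 4000),
    Domain("RdRp", 4636, 4939),
    Domain("MB", 5027, 5103),
    Domain("Hel", 5309, 5624),
]


def encoding_orf(domain: Domain) -> str:
    """Simplified assignment: positions up to the ORF1a length count as ORF1a."""
    if domain.end <= ORF1A_PROTEIN_LENGTH:
        return "ORF1a"
    if domain.start > ORF1A_PROTEIN_LENGTH:
        return "ORF1b"
    return "spans the ORF1a/ORF1b junction"


def ordered_and_disjoint(domains: list[Domain]) -> bool:
    """True if each domain ends before the next one starts."""
    return all(a.end < b.start for a, b in zip(domains, domains[1:]))


def polymerase_tripeptides(protein: str) -> list[int]:
    """1-based positions of SDD or GDD tripeptides, overlapping matches included."""
    return [m.start() + 1 for m in re.finditer(r"(?=[SG]DD)", protein.upper())]


for d in DOMAINS:
    print(f"{d.name:<5} {d.start}-{d.end}: {encoding_orf(d)}")
print(ordered_and_disjoint(DOMAINS))
# Hypothetical peptide, for illustration only.
print(polymerase_tripeptides("GWCYMILSDDGWCY"))  # [8]
```

The zero-width lookahead in the pattern lets adjacent candidates such as GDD and SDD in "GDDASDD" both be reported.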

Eleouet et al. [20] reported that, compared with that
of HCoV-229E, the putative ORF1a sequence of TGEV
lacks about 180 amino acids located between the
X domain and the second PLP motif. This additional
sequence is present in the PEDV ORF1a product.
Alignment (using GAP) of the HCoV-229E and PEDV
amino acid sequences in this region revealed 42.5%
similarity and 31.5% identity.

Conclusion 

Earlier sequence analysis of PEDV based on the
structural protein sequences showed that PEDV is
most closely related to HCoV-229E and TGEV
[7-9,27], less closely related to MHV-A59, and least
related to IBV. However, it was not possible to
determine the relative similarities of HCoV-229E,
TGEV and PEDV. In this study, the similarities and
identities from the amino acid sequence alignments of
ORF1a and ORF1b show clearly that PEDV is most
closely related to HCoV-229E and, moreover, that
HCoV-229E is more similar in sequence to PEDV
than it is to TGEV.

Fig. 3. Phylogenetic trees generated using TreeGen (CBRG server). Unrooted dendrograms based on Kimura distances for (A) ORF1a
and (B) ORF1b.

Fig. 4. Amino acid alignments of five coronavirus sequences covering the 3C-like protease (A) and the RNA-dependent RNA polymerase motif
(B). The consensus sequence of residues conserved in all sequences is shown at the top of the alignments and marked by asterisks. The
conserved SDD motif in polymerases is underlined. The positions of the aligned sequences relative to the start of the PEDV ORF1ab fusion
protein are shown for both motifs.

In addition to the sequence analysis, the present
work opens various possibilities for future research
on coronaviruses. Functional analysis of the as yet
uncharacterised PEDV ORF1 products and their
processing is now possible. Recently, Almazan et al.
and Yount et al. generated infectious TGEV from
cDNA [28,29], and Thiel et al. succeeded in
generating full-length cDNA clones of HCoV-229E
and IBV in a recombinant vaccinia virus
system [30]. The sequence and the cDNA clones
covering the entire genome of PEDV would allow
the development of a mini-genome system to study
viral replication, or the generation of an assembled,
infectious cDNA clone. Given the close
relationship between PEDV and HCoV-229E, the latter
approach could be used to exchange functional
regions between these viruses to gain new insights
into their biology. Furthermore, the
generation of a PEDV infectious clone could allow
the use of PEDV as a vaccine against TGEV.